LASSO-derived model for the prediction of bleeding in aspirin users

Aspirin is widely used for both primary and secondary prevention of panvascular diseases, such as stroke and coronary heart disease (CHD). The optimal balance between reducing panvascular disease events and the potential increase in bleeding risk remains unclear. This study aimed to develop a predictive model specifically designed to assess bleeding risk in individuals using aspirin. A total of 58,415 individuals treated with aspirin were included in this study. Detailed data regarding patient demographics, clinical characteristics, comorbidities, medical history, and laboratory test results were collected from the Affiliated Dongyang Hospital of Wenzhou Medical University. The patients were randomly divided into two groups at a ratio of 7:3. The larger group was used for model development, while the smaller group was used for internal validation. To develop the prediction model, we employed least absolute shrinkage and selection operator (LASSO) regression followed by multivariate logistic regression. The performance of the model was assessed through metrics such as the area under the receiver operating characteristic (ROC) curve (AUC), calibration curves, and decision curve analysis (DCA). The LASSO-derived model employed in this study incorporated six variables, namely, sex, operation, previous bleeding, hemoglobin, platelet count, and cerebral infarction. It demonstrated excellent performance at predicting bleeding risk among aspirin users, with a high AUC of 0.866 (95% CI 0.857–0.874) in the training dataset and 0.861 (95% CI 0.848–0.875) in the test dataset. At a cutoff value of 0.047, the model achieved moderate sensitivity (83.0%) and specificity (73.9%). The calibration curve analysis revealed that the nomogram closely approximated the ideal curve, indicating good calibration. The DCA curve demonstrated a favorable clinical net benefit associated with the nomogram model. 
Our developed LASSO-derived predictive model has potential as an alternative tool for predicting bleeding in clinical settings.

need for a highly accurate clinical model capable of effectively adjusting the type and duration of aspirin use to minimize ischemic risk while avoiding an increase in bleeding risk. Each score has its own strengths and limitations, which depend on the characteristics of the patient cohorts used for development and validation, making it applicable only to specific patients, clinical contexts, and timeframes [16]. No risk score has achieved perfect predictive performance. Having an enhanced decision-making tool for evaluating bleeding risk in individuals would facilitate informed discussions with aspirin users.
To better predict bleeding risk in aspirin users, we developed and validated a novel risk prediction tool in this study.

Study population
Participants in this study were recruited from the Affiliated Dongyang Hospital of Wenzhou Medical University. The inclusion criterion was documented use of aspirin recorded in the hospital's electronic medical records (EMRs) between January 2008 and December 2017. The exclusion criteria were missing aspirin data and a lack of relevant bleeding records. The study protocol received ethical approval from the Ethics Committee of the Affiliated Dongyang Hospital of Wenzhou Medical University (approval #2023-YX-408), which waived the requirement for written informed consent from the patients. All patient medical information was anonymized and deidentified before the analysis. This research involving human participants was conducted in accordance with the principles of the Declaration of Helsinki.

Outcome definition
This study examined the occurrence of various types of bleeding, such as cerebral hemorrhage, gastrointestinal bleeding, mucosal bleeding, and other commonly observed bleeding events, within a 5-year period after the administration of aspirin. These incidents were identified through recorded data in the hospital's discharge EMRs. For the purpose of this study, the presence of any bleeding was classified as a positive outcome, while the absence of bleeding was considered a negative outcome.

Risk factors
We extracted the following information from our hospital's EMRs of the study participants: sex, age, height, weight, BMI, and past medical history, including smoking status, alcohol consumption status, diabetes status, hypertension status, surgical history, previous bleeding episodes, presence of tumors, acute myocardial infarction, percutaneous coronary intervention (PCI), presence of gastric ulcers, use of gastric protective drugs, presence of cerebral infarction, portal hypertension, anticoagulant usage, and various clinical test indicators, such as cardiac ejection fraction (EF), white blood cell count (WBC), platelet count (PLT), peripheral hemoglobin (HGB), and glomerular filtration rate (GFR). For the research parameters, we considered the lowest values of the clinical test indicators within one month before the initiation of aspirin. Other past medical histories were recorded if they occurred before the commencement of aspirin therapy.

Data preprocessing
The data extracted from the clinical research big data platform underwent thorough cleaning procedures, which involved removing extreme values and imputing missing values. Indicators with missing values for over 20% of participants, such as height, weight, BMI, EF, and GFR, were excluded from the analysis. For the remaining predictor variables with missing values, multiple imputation techniques were applied. To train and evaluate the model, the data were divided into a training set comprising 70% of the data and a test set containing the remaining portion. The classification model was trained using the training set, while the performance of the model was assessed using the test set.
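As an illustration of two of the rules stated above (excluding indicators missing in more than 20% of participants, then a random 7:3 split), a minimal Python sketch follows. The study itself used R; the function names, the use of `None` to mark a missing value, and the fixed seed are assumptions made for this example, and multiple imputation is not shown.

```python
import random


def drop_sparse_columns(rows, max_missing=0.20):
    """Keep only columns whose share of missing (None) values is <= max_missing."""
    n = len(rows)
    keep = [
        col for col in rows[0].keys()
        if sum(row[col] is None for row in rows) / n <= max_missing
    ]
    return [{col: row[col] for col in keep} for row in rows]


def split_train_test(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and split them into training and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical toy records: HGB is missing for 1 of 10 patients, EF for 3 of 10.
records = [
    {"HGB": None if i == 0 else 120 + i, "EF": None if i < 3 else 60}
    for i in range(10)
]
cleaned = drop_sparse_columns(records)          # EF (30% missing) is dropped
train_set, test_set = split_train_test(cleaned)  # 7 and 3 records
```

Seeding the shuffle keeps the partition stable between runs, which matters when the same split is reused for model development and internal validation.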

Model building
The least absolute shrinkage and selection operator (LASSO) regression technique was employed to identify the most relevant predictive features [17]. The significant features identified through LASSO analysis were further subjected to stepwise backward multivariate logistic regression analysis. Based on this model, a nomogram was developed for predicting bleeding outcomes.
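For reference, the L1-penalized logistic regression objective that LASSO minimizes for a binary outcome can be written as follows. The notation is introduced here for illustration and does not come from the original text: n participants, outcome y_i in {0, 1}, predictor vector x_i with p candidate variables, intercept β_0, coefficients β, and penalty λ.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}\Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
- \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr)\Bigr]
+ \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Larger values of λ shrink more coefficients exactly to zero, which is what allows the method to be used for variable selection.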

Model evaluation
Discrimination was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), which summarizes the trade-off between sensitivity and specificity across all cutoffs. Calibration curves were analyzed to evaluate the agreement between the predicted and observed probabilities. The clinical utility of the identified risk factors in predicting bleeding risk was assessed through decision curve analysis (DCA), which considers the net benefit across different risk thresholds for patients. Moreover, the model's discriminatory performance was validated by comparing it with that of individual indicators. Figure 1 illustrates the flowchart depicting the process of model development and validation.
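The AUC can be read as the probability that a randomly chosen patient with bleeding receives a higher predicted risk than a randomly chosen patient without bleeding. A minimal Python sketch of that pairwise definition follows. The study computed AUCs with R packages; the function name and the toy data here are hypothetical.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical example: 2 bleeding (1) and 2 non-bleeding (0) patients.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs correct
```

The double loop is quadratic in the number of patients, so it is suited to illustration only; rank-based implementations give the same value far more efficiently.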

Statistical methods
Statistical analysis and data visualization were performed using R software (version 4.2.1) for Windows. Categorical variables are presented as n (%) and were compared using the χ² test or Fisher's exact test. Continuous variables are reported as mean ± standard deviation or median (interquartile range) and were compared using either Student's t test or the Mann-Whitney U test. Multiple imputation techniques were implemented using the "mice" package. Baseline description and difference analysis were performed with the "compareGroups" package. LASSO regression was conducted using the "glmnet" package, while multivariable logistic regression was performed using the "glm" function. Discrimination analysis was carried out using the "pROC," "ggROC," and "fbroc" packages. Calibration was assessed using the "rms" and "riskRegression" packages. Decision curve analysis (DCA) was conducted using the "rmda" package. The nomogram was created using the "rms" package. Comparisons of multiple models for ROC analysis were conducted using the "ROCR" package. Diagnostic evaluation was performed with the "reportROC" package. All statistical tests were two-sided, with P < 0.05 considered statistically significant.

Selected predictors and model construction
Following LASSO regression with tenfold cross-validation (fitted with "family = binomial"), six variables (sex, operation, previous bleeding, HGB, PLT, and cerebral infarction) were chosen for inclusion in the model based on the lambda.1se criterion. The processes of variable shrinkage and cross-validation are depicted in Fig. 2A and B, respectively. Through multivariate logistic regression using the stepwise backward process, all six variables were retained in the final model (Table 3).
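The lambda.1se rule picks the largest penalty whose cross-validated error lies within one standard error of the minimum, which favors a sparser model. A minimal Python sketch of that selection rule follows, assuming the mean error and standard error for each lambda are already available (in R, `cv.glmnet` reports these). The function name and all numbers are hypothetical.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambda_min minimizes the mean cross-validated error; lambda_1se is the
    largest lambda whose mean error is within one standard error of that minimum.
    """
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[i_min] + cv_se[i_min]
    eligible = [lam for lam, err in zip(lambdas, cv_mean) if err <= threshold]
    return lambdas[i_min], max(eligible)


# Hypothetical cross-validation results for four candidate penalties.
candidate_lambdas = [0.1, 0.05, 0.01, 0.005]
mean_errors = [0.30, 0.25, 0.24, 0.245]
standard_errors = [0.02, 0.02, 0.02, 0.02]

lambda_min, lambda_1se = select_lambdas(
    candidate_lambdas, mean_errors, standard_errors
)
```

In this toy example the minimum error occurs at 0.01, but 0.05 is still within one standard error of it, so the one-standard-error rule selects the stronger penalty.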

Model visualization
The nomogram presented in Fig. 3 provides a visual tool for predicting the risk of bleeding in individuals using aspirin based on the results of logistic regression analysis. For each risk factor, the corresponding points are determined by locating its value and drawing a vertical line to the points axis. The total points are calculated by summing the points of the six risk factors. To predict the bleeding risk for a specific aspirin user, a vertical line is drawn from the total points axis, intersecting with the corresponding probability on the nomogram. For instance, if a female aspirin user has not undergone surgery, has no history of cerebral infarction, has experienced previous bleeding, and has a platelet count of 200 × 10⁹/L and an HGB level of 100 g/L, the total score would be 153. A vertical line drawn from the total score of 153 intersects the probability axis at 0.496, indicating an estimated probability of bleeding of 49.6%.
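A nomogram of this kind is a graphical rendering of the fitted logistic regression. In generic notation (the symbols are introduced here for illustration, and the fitted coefficients are not reproduced), the predicted probability for a patient with predictor values x_1, ..., x_6 is:

```latex
P(\text{bleeding} \mid x) =
\frac{1}{1 + \exp\!\Bigl(-\bigl(\beta_0 + \sum_{j=1}^{6} \beta_j x_j\bigr)\Bigr)}
```

Each predictor's points scale is a linear rescaling of its term β_j x_j, so the total points map monotonically onto the probability axis.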

Model validation
To evaluate the discriminative ability of our model, we calculated the AUC of the ROC curve. Figure 4A shows that the AUC for the training dataset was 0.866 (95% CI 0.857–0.874), while Fig. 4B shows that the AUC for the test dataset was 0.861 (95% CI 0.848–0.875). By using a cutoff value of 0.047, the model achieved moderate sensitivity (83.0%), moderate specificity (73.9%), and a high negative predictive value (98.6%). The calibration curves, as shown in Fig. 5A and B, demonstrated excellent agreement between the predicted probability of bleeding and the actual observations in both the training and test sets. The Hosmer-Lemeshow goodness-of-fit (GOF) test also supported good consistency (p = 0.089). Figure 6 illustrates the results of decision curve analysis (DCA) for our developed model. The DCA results indicated a favorable net benefit in predicting bleeding risk among aspirin users. The threshold probability ranges were 1.0%–72% for the training dataset (Fig. 6A) and 1.0%–68% for the test dataset (Fig. 6B).
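The cutoff-based metrics and the net benefit used in DCA all derive from the same confusion counts. The following Python sketch is illustrative only: the function names and the ten-patient toy data are hypothetical, a prediction is treated as positive when the predicted risk is at or above the cutoff (an assumption), and the net benefit uses the standard formula TP/n - (FP/n) * pt/(1 - pt), where pt is the threshold probability.

```python
def confusion_counts(labels, risks, cutoff):
    """Count true/false positives and negatives at a given risk cutoff."""
    tp = fp = tn = fn = 0
    for y, risk in zip(labels, risks):
        predicted_positive = risk >= cutoff  # assumption: ">=" defines a positive
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def cutoff_metrics(labels, risks, cutoff):
    """Sensitivity, specificity, and negative predictive value at a cutoff.

    Assumes both outcome classes and at least one negative prediction exist.
    """
    tp, fp, tn, fn = confusion_counts(labels, risks, cutoff)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }


def net_benefit(labels, risks, threshold):
    """Net benefit of treating patients whose predicted risk meets the threshold."""
    tp, fp, _, _ = confusion_counts(labels, risks, threshold)
    n = len(labels)
    return tp / n - (fp / n) * threshold / (1 - threshold)


# Hypothetical data: 1 = bleeding, 0 = no bleeding, with predicted risks.
toy_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
toy_risks = [0.30, 0.02, 0.10, 0.01, 0.02, 0.03, 0.01, 0.04, 0.02, 0.01]

toy_metrics = cutoff_metrics(toy_labels, toy_risks, cutoff=0.047)
toy_net_benefit = net_benefit(toy_labels, toy_risks, threshold=0.20)
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the "treat all" and "treat none" strategies.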

Model comparison with a single indicator
We assessed the discriminative capability of our developed model (nomogram) compared to that of different single indicators. Figure 7 clearly shows that our model outperformed the single indicators in terms of discriminative ability.

Discussion
In this study, we developed and validated a LASSO-derived model to evaluate the risk of bleeding in individuals taking aspirin. The final model included six significant predictors: sex, past surgery, previous bleeding, HGB, PLT, and cerebral infarction. Our model exhibited excellent discrimination, calibration, and net clinical benefit. The nomogram provides an intuitive graphical representation of the results, serving as a valuable tool for clinicians to estimate bleeding risk in aspirin users.
Various factors have been reported to affect bleeding in individuals using aspirin, including HGB [18], PLT [19,20], previous bleeding [18,21], cerebral infarction [22,23], sex [24], and past surgery [25]. Our objective was to incorporate this comprehensive information into our risk model for predicting bleeding. Previous studies have emphasized the significance of lower HGB as a strong predictor of major bleeding [26], as it has been linked to higher bleeding rates [27]. Our study reaffirmed the critical role of HGB as an indicator for assessing bleeding risk. We also identified PLT as another important predictor, where lower PLT was associated with higher bleeding risk and poorer prognosis [28]. Our predictive model aligns with the findings of previous studies. A history of bleeding has been established as a crucial factor in guiding treatment plans for aspirin users [18]. Cerebral infarction has been associated with a heightened risk of hemorrhagic transformation [29], which is consistent with our results and may be attributed to increased use of antithrombotic medications. Our finding that male sex was independently associated with higher bleeding risk is also consistent with earlier research [30]. In addition to the aforementioned factors, a history of surgery has been recognized as a potential predictor of bleeding [31], and our findings support the significant association between surgical history and elevated bleeding risk.
In clinical practice, taking aspirin in conjunction with anticoagulants or undergoing dual antiplatelet therapy (DAPT) has a synergistic effect that greatly increases the risk of bleeding events. However, our prediction model does not include markers for anticoagulants or DAPT. This omission may be attributed to the fact that in our model, previous bleeding serves as a crucial indicator. Furthermore, medical professionals exercise greater caution when considering the use of DAPT and anticoagulants in patients with a history of previous bleeding.
Tailored management strategies are essential for aspirin users with different levels of bleeding risk. High-risk patients require a comprehensive evaluation and optimization of their condition before aspirin is prescribed. The development of an accurate predictive model for bleeding in aspirin users has significant clinical implications. Such a model can aid in managing comorbidities, optimizing blood parameters (e.g., HGB, PLT), considering clinical features (e.g., age, sex), addressing underlying diseases (e.g., operation, previous bleeding), and evaluating the patient's medications to improve coagulation function. The model's results are presented in a user-friendly nomogram, letting clinicians easily estimate individual bleeding risk and make personalized treatment decisions.
Despite the notable findings of our study, it is important to acknowledge certain limitations. First, the retrospective design of our study prevents us from establishing causal relationships. Further validation through prospective studies is warranted to confirm the predictions of the model developed here. Second, the presence of missing data for certain variables may introduce bias; however, we mitigated this bias by excluding variables with substantial missing information (>20% of patients). Moreover, our study specifically focused on aspirin users, and the generalizability of our findings to other patient populations requires additional investigation.
Overall, our model provides valuable clinical insights and supports decision-making in managing bleeding among aspirin users, potentially optimizing treatment planning, improving patient outcomes, and enhancing resource allocation. Future research should externally validate our model in diverse patient cohorts to ensure its reliability and effectiveness in routine clinical practice.

Figure 2. Variable selection was performed using LASSO regression. (A) Coefficient profile plots were generated by plotting the coefficients against the log(lambda) sequence. This visualization facilitated the variable selection process and enabled identification of the nonzero coefficient variables based on the optimal lambda value. (B) The optimal values, determined using the one-standard-error criterion (lambda.1se), are represented by dotted vertical lines in the plots.

Figure 3. A nomogram based on the combination of six indicators was developed. If a patient's total score is 153, the corresponding probability of bleeding is 0.496. HGB, hemoglobin; PLT, platelet count.

Figure 4. Receiver operating characteristic (ROC) curves of the model for distinguishing bleeding from nonbleeding patients. (A) Training set; (B) validation set. At a cutoff value of 0.047, the model achieved moderate sensitivity (83.0%), moderate specificity (73.9%), and a high negative predictive value (98.6%).

Figure 5. Calibration curves of the model. (A) Training set; (B) validation set. The y-axis represents the actual diagnosed cases of bleeding, while the x-axis represents the predicted risk of bleeding. The diagonal dotted gray line depicts perfect prediction by an ideal model, while the black line represents the performance in the training set (left) and validation set (right). A closer alignment between the black and gray lines indicates better prediction performance.

Figure 6. Decision curve analysis (DCA) of the model. (A) Training set; (B) validation set. The y-axis represents the net benefit. The horizontal lines labeled "None" indicate the assumption that no participant had bleeding, while the lines labeled "All" represent the assumption that all participants had bleeding. The red lines correspond to the predictive model developed in this study.

Figure 7.